Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit code's abundance of patterns. In this article, we survey this work. We contrast programming languages against natural languages and discuss how these similarities and differences drive the design of probabilistic models. We present a taxonomy based on the underlying design principles of each model and use it to navigate the literature. Then, we review how researchers have adapted these models to application areas and discuss cross-cutting and application-specific challenges and opportunities.